Automatic Recognition of Tibetan Buddhist Text by Computer

Authors

Masami Kojima

Yoshiyuki Kawazoe

Masayuki Kimura

Abstract

The purpose of this study is to develop a plausible method for automatically coding and compiling Buddhist texts from the original Tibetan script into Romanized form. We extract syllables from Tibetan texts and automatically recognize the Tibetan characters. The Tibetan character set consists of 30 basic consonants, 76 combination characters, and 4 vowels. Despite the limited number of Tibetan characters, many of them are similar in shape. Therefore, to distinguish them, we apply an Object Oriented Dictionary (OOD), which is created by combining the categorization and character identification procedures. Our experiment confirms that the Object Oriented Method can dramatically improve the Tibetan character recognition rate [Ref. 1, 3]. We also offer our opinion on automatic character recognition for woodblock-printed Tibetan manuscripts.
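As a rough illustration of the two-stage idea in the abstract (categorize first, then identify within the category), the following minimal sketch may help. It is not the authors' implementation: the class names, the binary feature vectors, the use of the first two bits as a coarse category key, and the Hamming-distance matching are all assumptions made for this example. Only the Romanized labels `ka`, `ga`, and `nga` correspond to real Tibetan consonants.

```python
# Toy two-stage lookup: coarse categorization, then identification among
# similar-looking candidates. All feature values below are invented.

def hamming(a, b):
    """Count positions where two equal-length feature tuples differ."""
    return sum(x != y for x, y in zip(a, b))


class Category:
    """Holds templates of characters that share one coarse shape key."""

    def __init__(self, key):
        self.key = key
        self.templates = {}  # Romanized label -> fine feature tuple

    def identify(self, fine):
        # Pick the template closest to the observed fine features.
        return min(self.templates,
                   key=lambda roman: hamming(self.templates[roman], fine))


class ObjectOrientedDictionary:
    """Hypothetical dictionary: categories are objects that own their templates."""

    COARSE_BITS = 2  # assumption: the first two features define the category

    def __init__(self):
        self.categories = {}

    def add(self, roman, features):
        key = features[:self.COARSE_BITS]
        category = self.categories.setdefault(key, Category(key))
        category.templates[roman] = features[self.COARSE_BITS:]

    def recognize(self, features):
        # Stage 1: categorization by coarse key.
        category = self.categories.get(features[:self.COARSE_BITS])
        if category is None:
            return None
        # Stage 2: identification inside the selected category.
        return category.identify(features[self.COARSE_BITS:])


ood = ObjectOrientedDictionary()
ood.add("ka", (1, 0, 1, 1, 0, 0))
ood.add("ga", (1, 0, 1, 0, 0, 1))
ood.add("nga", (0, 1, 0, 0, 1, 1))

# A noisy observation of "ka": one fine feature is flipped.
print(ood.recognize((1, 0, 1, 1, 1, 0)))
```

Restricting the fine comparison to one category means a noisy glyph is only compared against characters of similar shape, which is one plausible reading of why combining categorization with identification could raise the recognition rate.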

Similar resources

A character recognition scheme based on object oriented design for Tibetan buddhist texts

The purpose of this study is to develop a plausible method to code and compile Buddhist texts from original Tibetan scripts into Romanized form. Using GUI (Graphical User Interface) based on Object Oriented Design, a dictionary of Tibetan characters can be easily made for Buddhist literature researchers. It is hoped that a computer system capable of highly accurate character recognition will be...

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

To facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

The multiple pronunciations in Taiwanese and the automatic transcription of Buddhist sutra with augmented read speech

Collection of Taiwanese text corpus with phonetic transcription suffers from the problems of multiple pronunciation, or pronunciation variation. By further augmenting the text with read speech, and using automatic speech recognition with a sausage searching net constructed from the multiple pronunciations of the text corresponding to its speech utterance, we are able to reduce the effort for ph...

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

Research on Tibetan Automatic Word Segmentation

This paper studies Tibetan automatic word segmentation. We focus on three key technologies of Tibetan automatic word segmentation: (1) a Tibetan automatic word segmentation approach is proposed, which takes advantage of case-auxiliary words and continuous features; (2) a resolution method for overlapping ambiguity in Tibetan word segmentation is proposed, which is based on forward-b...

Publication date: 1999
